
GENOMIC SEQUENCE OF NGR234 SYMBIOTIC 
PLASMID, ITS GENE MAP, AND ITS USE IN 
DIAGNOSTICS AND GENE TRANSFER IN AGRICULTURE 


TECHNICAL FIELD 

This invention relates to a symbiotic plasmid of the
broad host-range Rhizobium sp. NGR234 and its use. In
particular, this invention relates to the isolation and
analysis of the complete sequence of the NGR234 symbiotic
plasmid pNGR234a, and the open reading frames (ORFs)
identifiable therein as well as the proteins expressible
from said ORFs.


BACKGROUND OF THE INVENTION 

Together with carbon, hydrogen and oxygen, nitrogen
is one of the essential components in organic chemistry.
Although it is present in vast quantities in the atmosphere,
nitrogen in its diatomic form N2 remains unassimilable by
living organisms. The nitrogen cycle begins with the fixation
of nitrogen into ammonia, which is chemically more reactive
and can be assimilated into the food chain. A large
fraction of the total nitrogen fixed every year is produced
by microorganisms. Among these, the soil bacteria of the
genera Azorhizobium, Bradyrhizobium, Sinorhizobium and
Rhizobium, generally referred to as rhizobia, fix nitrogen
in symbiotic associations with many plants from the
Leguminosae family. This highly specific interaction leads
to the formation of specialized root structures and, in the
case of Azorhizobium, stem structures called nodules. It is within
these nodules that rhizobia differentiate into bacteroids
capable of fixing atmospheric nitrogen into ammonia. In
turn, ammonia diffuses into the plant cells and sustains
plant growth even under nitrogen-limiting conditions.

The Rhizobium-legume interaction presents many
interesting features. Obviously, the possibility of using
this symbiosis as an "environmentally friendly" way to
provide some of the most important world crops (such as
soybean, bean and many other legumes) with fixed nitrogen
without using nitrate-rich fertilizers has important
economic consequences. It is also an ideal model to study
a non-pathogenic interaction between bacteria and a highly
developed, multicellular organism such as the host plant.
Furthermore, the various steps involved in the establishment
of a functional nitrogen symbiosis, which include some
dramatic morphological changes as well as processes of
cellular differentiation, require a complex exchange of
molecular signals. Despite many decades of studies, it is
only recently that the Rhizobium-legume interaction has been
partially understood at the molecular level. The
establishment of a functional symbiosis can be divided into
two major steps as follows.

(A) Rhizosphere ecology and nodulation:

Rhizobia are soil bacteria that proliferate in the
rhizosphere of compatible plants, taking advantage of the
many compounds released by plant roots. In return, it has
been shown that the presence of rhizobia in the rhizosphere
reduces susceptibility of plants to many root diseases. In
the case of low nitrogen levels in the soil, compatible
rhizobia can interact with host plants and start the
nodulation process (Long, 1989; Fellay et al., 1995; van
Rhijn and Vanderleyden, 1995). Molecular signalling between
the two partners begins with the release by the plant of
phenolic compounds (mostly flavonoids) that induce the
expression of nodulation genes (referred to as nod, nol and
noe genes). The NodD1 gene product appears to be the
central mediator between the plant signal and nodulation
gene induction (Bender et al., 1988). It is modified by the
binding of flavonoids and acts as a positive regulator on
the expression of the remaining nodulation genes. Among
them, the nodABC loci encode products responsible for the
synthesis of the core structure of lipooligosaccharides
called Nod factors (Relic et al., 1994). More nodulation
genes are involved in strain-specific modifications of the
Nod factors as well as in their secretion. It seems
established now that variability in the structure of Nod
factors may play a significant role in the determination of
the host-range of a given Rhizobium strain, that is, in its
ability to efficiently nodulate different legumes. For
example, the strain Rhizobium meliloti can only nodulate
Medicago, Melilotus and Trigonella spp., whereas Rhizobium
sp. NGR234 can symbiotically interact with more than 105
different genera of plants, including the non-legume
Parasponia andersonii.


The structure of many Nod factors, their isolation
from Rhizobium strains and their commercial application in
agriculture have been described (NodNGR factors: Relic et
al., 1994; WO 94/00466; NodRm factors: WO 91/15496).
Secreted Nod factors act in turn as signal molecules that
allow rhizobia to enter young root hairs of a host plant,
and induce root-cortical cell division that will produce the
future nodule. Invaginated rhizobia progress towards the
forming nodule within infection threads that are synthesized
by the plant cells. Bacteria are then released into the
cytoplasm of dividing nodule cells where they differentiate
into bacteroids capable of fixing atmospheric nitrogen.

With respect to regulation of the nodulation genes,
other regulatory genes with similarities to nodD1 (genes
that belong to the lysR family) have been identified in
various strains (Davis and Johnston, 1990). The function of
these genes, called nodD2, nodD3 or syrM, is only partially
understood. Some nodD genes have been described (WO
94/00466; CA 1314249; WO 87/07910; US 5023180). Also,
recombinant DNA molecules including the consensus sequence
of the promoters of nodD1-regulated genes, called nod-boxes
(Fisher and Long, 1993), have been disclosed (US 5484718;
US 5085588). Finally, recombinant plasmids with the nodABC
genes or, in one case (Bradyrhizobium japonicum), a sequence
influencing host specificity have been disclosed (US
5045461; US 4966847).

(B) Symbiotic nitrogen fixation:



Inside the nodules, rhizobia differentiate into
bacteroids that express the enzymatic complex (nitrogenase)
required for the reduction of atmospheric nitrogen into
ammonia. The nitrogenase is encoded by three genes, nifH,
nifD and nifK, which are well conserved in nitrogen-fixing
organisms (Badenoch-Jones et al., 1989). Many additional
loci are necessary for functional nitrogenase activity.
Those originally identified in Klebsiella pneumoniae are
known as nif genes, whereas those found only in Rhizobium
strains are described as fix genes (Fischer, 1994). Some of
these gene products are required for the biosynthesis of
cofactors or the assembly of the enzymatic complex, or play
regulatory and various accessory roles (oxygen-limited
respiration, etc.). Many of these genes are less conserved
among the various rhizobial strains and in some cases their
function is still not fully understood. The high
sensitivity of the nitrogenase complex to free oxygen
requires a very strict control of most nif and fix gene
expression. In this respect, the FixL, FixJ, FixK, NifA and
RpoN proteins have been identified in representative
Rhizobium species as the major regulatory elements that, in
microanaerobic conditions, activate the synthesis of the
nitrogenase complex (Fischer, 1994). Recombinant DNA
molecules containing nif genes/promoters have been
disclosed: nifH promoters of B. japonicum (US 5008194), nifH
and nifD promoter of R. japonicum (EP 164245), nifA of B.
japonicum and R. meliloti (EP 339830), nifHDK and
hydrogen-uptake (hup) genes of R. japonicum (EP 205071).

Many more genetic determinants play a significant role
in the Rhizobium-legume symbiosis. Genes (exo, lps and ndv
genes) involved in the production of extracellular
polysaccharides (EPS), lipopolysaccharides (LPS) and cyclic
glucans of rhizobia play an essential role in the symbiotic
interaction (Long et al., 1988; Stanfield et al., 1988).
Mutation in these genes negatively influences the
development of functional nodules. In this respect, some
exopolysaccharides of the NGR234 derivative strain ANU280
have been disclosed (WO 87/06796). Although Nod factors
seem to play a key role in the nodulation process,
experimental data indicate that other signal molecules
produced by the bacterial symbionts are required for
functional symbiosis and may play a role in coordinating
various steps such as the controlled invasion process, the
release of rhizobia from the infection thread into the plant
cell cytoplasm, the bacteroid differentiation process, etc.
Moreover, the need for rhizobia to survive in the
rhizosphere and to compete adequately with other
microorganisms requires many more unidentified genes that,
although they may not be characterised as proper symbiotic
loci, do affect the efficiency of the various strains to
induce functional nitrogen-fixing symbiosis in field
conditions. Finally, in our view genetic engineering of
improved rhizobial strains cannot be pursued without a more
extended knowledge of the structure and complexity of the
Rhizobium symbiotic genome.

In this respect we decided to determine the complete
DNA sequence of a symbiotic plasmid of Rhizobium sp. NGR234.
In contrast to Bradyrhizobium and Azorhizobium, which carry
symbiotic genes on large chromosomes (ca. 8 Mbp), and to
R. meliloti, which harbours two very large symbiotic plasmids
of 1.4 and 1.6 Mbp, NGR234 carries a single plasmid of ca.
500 kbp, pNGR234a. Moreover, it has been shown by transfer
of pNGR234a into heterologous rhizobia, and even into
non-nodulating Agrobacterium tumefaciens, that most nodulation
functions are encoded by this plasmid (Broughton et al.,
1984). The fact that NGR234 is able to interact
symbiotically with more plants than any other known strain,
and that a complete ordered cosmid library of pNGR234a was
available, reinforced NGR234 as the best choice for a
large-scale sequencing effort on a symbiotic plasmid (Perret et
al., 1991; Freiberg et al., 1997).


Automated fluorescent methods have been used to
sequence cosmids from eukaryotic organisms, including
Saccharomyces cerevisiae (Levy, 1994), Caenorhabditis
elegans (Sulston et al., 1992), Drosophila melanogaster
(Hartl and Palazzolo, 1993), and Homo sapiens (Bodmer,
1994), as well as chromosomes from the prokaryotes
Haemophilus influenzae (Fleischmann et al., 1995) and
Mycoplasma genitalium (Fraser et al., 1995). In most
large-scale sequencing centres this technology is based mainly on
the shotgun approach. After random fragmentation of DNA
(e.g. cosmids, bacterial artificial chromosomes (BACs),
entire chromosomes) using sonication or mechanical forces,
size-selected fragments are subcloned into M13 phages,
phagemids or plasmids and sequenced by cycle sequencing
using dye primers (Craxton, 1993). A disadvantage of this
method is that DNA regions with elevated GC contents produce
large numbers of compressions (unresolvable foci in sequence
gels) in the dye primer sequences, leading to several hundred
compressions per assembled cosmid sequence. It is known
that the use of dye terminators (fluorescently labelled
dideoxynucleoside triphosphates) instead of dye primers
reduces the number of compressions (Rosenthal and
Charnock-Jones, 1993). Therefore, dye terminators are frequently
used for gap closure and proofreading after assembly
of the shotgun data.

To sequence GC-rich cosmids with the highest accuracy,
the effectiveness of shotgun sequencing with dye terminators
in comparison to dye primer sequencing was investigated. To
improve the incorporation of dye terminators into DNA, a
modified Taq DNA polymerase carrying a single mutation was
used (Tabor and Richardson, 1995). This enzyme has
properties similar to a thermostable "sequenase" and is
commercially available as Thermo Sequenase (Amersham,
Buckinghamshire, UK) or AmpliTaq FS (Perkin-Elmer, Foster
City, CA, USA). Concentrations of dye terminators needed in
the cycle sequencing reactions can be reduced 20- to 250-fold.
It was found that dye terminator shotgun sequencing
leads to compression-free raw data that can be assembled
much faster than shotgun data mainly obtained by dye primer
sequencing. This strategy thus allows a several-fold
increase in the speed of sequencing individual cosmids. This was
demonstrated by comparing assembly of the sequence data of
two cosmids from pNGR234a generated by different
chemistries: cosmid pXB296 was sequenced with dye
terminators, whereas data for pXB110 were obtained using the
common dye primer method. Also disclosed is the analysis of
the entire pXB296 sequence.

Moreover, the dye terminator shotgun sequencing
strategy used to generate the sequence data for pXB296 was
also used to sequence all the remaining overlapping
cosmids of the plasmid pNGR234a. In summary, 20 cosmids
have been sequenced together with two PCR products and a
subcloned DNA fragment derived from a cosmid identified as
pXB564 in order to generate the plasmid's complete
nucleotide sequence.


After its assembly, the entire nucleotide sequence of
pNGR234a was analyzed, in particular to determine putative
coding regions and to predict their expressible proteins and
putative functions. Initially, analysis of the region covered by
cosmid pXB296 was extended to cosmids pXB368 and pXB110.
Thus, in approximately 100 kb of the plasmid (position
417,796 - 517,279) most ORFs and their deduced proteins with
different putative functions were predicted. Subsequently,
the rest of pNGR234a was analyzed.
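
To illustrate what the determination of putative coding regions involves, the following minimal Python sketch scans all six reading frames of a sequence for ATG-to-stop open reading frames. It is not the procedure used for pNGR234a: the function names, the ATG-only start rule, the standard stop codons and the `min_codons` threshold are assumptions made for brevity.

```python
from typing import List, Tuple

START = "ATG"                      # assumption: only ATG is treated as a start codon
STOPS = {"TAA", "TAG", "TGA"}      # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for every ATG..stop ORF.

    Frames 1..3 are on the given strand, -1..-3 on the reverse complement.
    start/end are 0-based, half-open coordinates on the strand scanned,
    and end includes the stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            # Walk the frame codon by codon.
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i                     # first ATG opens an ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand * (offset + 1), start, i + 3))
                    start = None                  # look for the next ORF
    return orfs
```

Calling `find_orfs(sequence, min_codons=100)` on a cosmid-sized sequence would list candidate ORFs; assigning putative functions still requires database comparison, which this sketch does not attempt.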

SUMMARY OF THE INVENTION



The present invention provides the complete nucleotide
sequence of symbiotic plasmid pNGR234a of Rhizobium sp.
NGR234, or degenerate variants thereof.

The present invention also contemplates sequence 
variants of the plasmid pNGR234a altered by mutation, 
deletion or insertion. 


Also encompassed by the present invention are each of
the ORFs derivable from the nucleotide sequence of pNGR234a
or variants thereof.

In a preferred embodiment, the ORFs derived from the
nucleotide sequence of pNGR234a encode the functions of
nitrogen fixation, nodulation, transportation, permeation,
synthesis and modification of surface poly- or
oligosaccharides, lipo-oligosaccharides or secreted
oligosaccharide derivatives, secretion (of proteins or other
biomolecules), transcriptional regulation or DNA-binding,
peptidolysis or proteolysis, transposition or integration,
plasmid stability, plasmid replication or conjugal plasmid
transfer, stress response (such as heat shock, cold shock or
osmotic shock), chemotaxis, electron transfer, synthesis of
isoprenoid compounds, synthesis of cell wall components,
rhizopine metabolism, synthesis and utilization of amino
acids, rhizopines, amino acid derivatives or other
biomolecules, degradation of xenobiotic compounds, or encode
proteins exhibiting similarities to proteins of amino acid
metabolism or related ORFs, or enzymes (such as
oxidoreductase, transferase, hydrolase, lyase, isomerase or
ligase).

In another preferred embodiment, the ORFs are under
the control of their natural regulatory elements or under
the control of analogues to such natural regulatory
elements.

The present invention also provides the sequences of
the intergenic regions of pNGR234a which, in a preferred
embodiment, are regulatory DNA sequences or repeated
elements. In a further preferred embodiment, the intergenic
sequences are ORF-fragments.

Also provided by the present invention are mobile
elements (insertion elements or mosaic elements) derivable
from the nucleotide sequences of the present invention.

The present invention also contemplates the use of the 
disclosed nucleotide sequences or ORFs in the analysis of 
genome structure, organisation or dynamics. 


Also provided by the present invention is the use of
the nucleotide sequences or ORFs in the subcloning of new
nucleotide sequences. In a preferred embodiment, the new
nucleotide sequences are coding sequences or non-coding
sequences.

In yet a further preferred embodiment, the nucleotide 
sequences or ORFs are used in genome analysis and subcloning 
methods as oligonucleotide primers or hybridization probes. 


The present invention further provides proteins 
expressible from the disclosed nucleotide sequences or ORFs. 

Also contemplated by the present invention is the use
of the disclosed nucleotide sequences, individual ORFs or
groups of ORFs or the proteins expressible therefrom in the
identification and classification of organisms and their
genetic information, the identification and characterisation
of nucleotide sequences, the identification and
characterisation of amino acid sequences or proteins, the
transportation of compounds to and from an organism which is
host to said nucleotide sequences, ORFs or proteins, the
degradation and/or metabolism of organic, inorganic, natural
or xenobiotic substances in a host organism, or the
modification of the host-range, nitrogen fixation abilities,
fitness or competitiveness of organisms.

The present invention also provides plasmid pNGR234a
of Rhizobium sp. NGR234 comprising the disclosed nucleotide
sequence or any degenerate variant thereof.

The present invention also provides a plasmid
harbouring at least one of the disclosed ORFs or any
degenerate variant thereof.

The plasmids of the invention may be produced
recombinantly and/or by mutation, deletion, insertion or
inactivation of an ORF, ORFs or groups of ORFs.

The present invention also provides the use of the
disclosed plasmids or variants thereof in obtaining a
synthetic minimal set of ORFs required for functional
Rhizobium-legume symbiosis, the modification of the
host-range of rhizobia, the augmentation of the fitness or
competitiveness of Rhizobium sp. NGR234 in the soil and its
nodulation efficiency on host plants, the introduction of
desired phenotypes into host plants using the disclosed
plasmids as stable shuttle systems for foreign DNA encoding
said desired phenotypes, or the direct transfer of the
disclosed plasmids into rhizobia or other microorganisms
without using other vectors for mobilization.

The nucleotide sequences of the present invention were
advantageously obtained using known cycle sequencing
methods. The preferred dye terminator/thermostable
sequenase shotgun sequencing method used to generate the
nucleotide sequences of the present invention, when applied
to cosmids and when compared to other sequencing methods,
was shown to yield sequence reads of the highest fidelity.
Consequently, the speed of assembly of particular cosmids
was increased, and the resultant high-quality sequences
required little editing or proofreading. Thus, the
preferred sequencing method described herein was
successfully used to generate the complete nucleotide
sequence of all the overlapping cosmids of plasmid pNGR234a,
thereby resulting in the assembly of the complete sequence
of the plasmid.

The complete sequence of pNGR234a is disclosed for the
first time in this application, as are the majority of the
ORFs predicted within the sequence. Putative functions have
been ascribed to the novel and inventive ORFs disclosed
herein and the proteins for which they code.



BRIEF DESCRIPTION OF DRAWINGS



The present invention is described below and 
illustrated thereafter in the appended examples, with 
reference to the following figures: 

Figure 1 A graph comparing sequences from pXB296 created
by different cycle sequencing methods.

Figure 2 A schematic diagram showing the organization of
the predicted ORFs in pXB296 from Rhizobium
sp. NGR234.

Figure 3 The complete nucleotide sequence of plasmid
pNGR234a (with the pages labelled sequentially
from 19961 to 1996 142).

Figure 4 A schematic diagram showing the map of the 20
sequenced cosmids covering the 536 kb symbiotic
plasmid pNGR234a of Rhizobium sp. NGR234.

Figure 5 A diagram indicating multiple alignments of the
nucleotide sequence of the replication origins
of various plasmids.

Figure 6 A diagram indicating multiple DNA sequence
alignments of the regions containing the origin
of transfer of various plasmids.

Figure 7 A schematic diagram showing a circular
representation of the symbiotic plasmid pNGR234a
of NGR234.

DETAILED DESCRIPTION OF THE INVENTION AND BEST MODE 

Comparison of Different Shotgun Sequencing Strategies 


The following is a more detailed description of 
certain key aspects of the present invention. 

GC-rich cosmids were examined to investigate whether
they could be sequenced much more efficiently using dye
terminators throughout the shotgun phase instead of dye
primers. As a test case, cosmid pXB296 with a GC content of
58 mol% from pNGR234a, the symbiotic plasmid of Rhizobium
sp. NGR234, was sequenced exclusively using dye terminators
in combination with a thermostable sequenase [Thermo
Sequenase (Amersham)]. Another rhizobial cosmid with
identical GC content, pXB110, was sequenced using
traditional dye primer chemistry and Taq DNA polymerase.

Using the dye terminator/thermostable sequenase
shotgun strategy, it was shown that most, if not all,
compressions could be resolved and reads were produced with
the highest fidelity among all sequencing chemistries
tested. As a result, a much faster assembly of cosmid
pXB296 in comparison to pXB110 was obtained. The shotgun
data could be assembled into a high-quality sequence without
extensive editing and proofreading. By measuring the error
rate in overlapping regions between individual cosmids from
pNGR234a, as well as the cosmid vector sequence itself (data
not shown), it was estimated that the accuracy of the pXB296
sequence is higher than 99.98%. Using other thermostable
sequenases such as AmpliTaq FS (Perkin-Elmer), similar
results were expected because thermostable sequenases have
similar properties.
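
The accuracy estimate from overlapping regions can be pictured with a small Python sketch. It assumes that the two independently sequenced copies of an overlap are already aligned and of equal length (no gaps), which is a simplification; the function name is hypothetical and the underlying data are not shown in this document. An accuracy above 99.98% corresponds to fewer than 2 discrepancies per 10,000 compared bases.

```python
def overlap_accuracy(copy_a: str, copy_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length overlap sequences.

    Assumption: the inputs are gap-free alignments of the same region
    sequenced independently (e.g. from two overlapping cosmids).
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("sequences must be aligned to the same length")
    if not copy_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the two copies disagree.
    mismatches = sum(1 for a, b in zip(copy_a.upper(), copy_b.upper()) if a != b)
    return 100.0 * (1 - mismatches / len(copy_a))
```

With real data, insertions and deletions would also have to be counted, so a pairwise alignment step would precede this calculation.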

Dye primer chemistry in combination with Thermo
Sequenase was also examined. Although the peak uniformity
of signals was much improved over dye primer/Taq DNA
polymerase data, the number of compressions in GC-rich
shotgun reads was not reduced significantly. Compressions
in shotgun raw data enormously increase the overall effort
of editing, proofreading, and finishing a cosmid, as shown
for pXB110 (Table 1).

Because of their longer reading potential, dye primer
reads are helpful for gap closure. However, using ABI 373A
sequencers (Applied Biosystems, Inc. (ABI), Perkin-Elmer,
Foster City, CA, USA), dye primer reads are, on average,
only ~50 bases longer than dye terminator reads.

Using the experimental conditions of the present
invention, shotgun sequencing with dye terminators and a
thermostable sequenase is superior because for GC-rich
cosmid templates it removes most of the compressions, and
this leads to a several-fold improvement in assembling and
finishing of cosmid-sized projects. Although dye
terminators are slightly more expensive than dye primers,
the overall saving in time for finishing projects has, in
our experience, a much greater effect on general costs.

It has been shown that the strategy of the present
invention is effective for high-throughput shotgun
sequencing of GC-rich templates. This strategy was
therefore used to sequence the remaining 19 overlapping
cosmids of the symbiotic plasmid pNGR234a of Rhizobium sp.
NGR234. In total, 20 cosmids, two PCR products (1.5 and
2.0 kb in length) and a 1.5 kb restriction fragment were
sequenced in order to generate the complete pNGR234a
sequence (Figure 4).

Genetic Organization of pXB296

All 28 predicted open reading frames (ORFs) in pXB296
(Figure 2) show significant homologies to database entries
(Table 2). The first putative gene cluster (cluster I),
containing ORF1 to ORF5, corresponds to various oligopeptide
permease operons (Hiles et al., 1987; Perego et al., 1990).
Only ORF5 shows homology to a gene from a different
bacterium, Bacillus anthracis (Makino et al., 1989). Each
homologue encodes membrane-bound or membrane-associated
proteins, suggesting that all five ORFs are involved in
oligopeptide permeation.

Organization of the predicted gene cluster IV,
including the nifA homologue ORF16 (fixABCX, nifA, nifB,
fdxN, ORF, fixU homologues, position 16,746 - 24,731), the
predicted locations of the σ54-dependent promoters and the
nifA upstream activator sequences (Figure 2), correspond to
the organization found in Rhizobium meliloti and Rhizobium
leguminosarum bv. trifolii (Iismaa et al., 1989; Fischer,
1994). NifA is a positive transcriptional activator
(Buikema et al., 1985), whereas nif and fix genes are
essential for symbiotic nitrogen fixation. Identification
of σ54-dependent promoter sequences, together with the
upstream activator motifs upstream of ORF21, ORF22, and
ORF23, suggests that these ORFs may play an important, but
still undefined, role in symbiosis.

Inevitably, large-scale sequencing uncovers
differences with already published sequences. van Slooten
et al. (1992) cloned a 5.8 kb EcoRI fragment from Rhizobium
sp. NGR234 and sequenced 2067 bp by manual radioactive
methods (EMBL accession no. S38912). This sequence exhibits
2.4% mismatches with the corresponding sequence in pXB296.
It contains the gene dctA (encoding a C4-dicarboxylate
permease), which is 144 bases shorter than in pXB296. In
this respect, a single nucleotide deletion in position
29,248 of the cosmid sequence close to the 3' end of the
gene causes a frameshift leading to a DctA product extended
by 48 residues. van Slooten et al. (1992) also failed to
identify the nifQ homologue, ORF23 (position 27,169 -
27,861), presumably because they overlooked a small XhoI
fragment located between positions 27,349 and 27,536 on
pXB296. Expression studies allowed these investigators to
define a putative σ54-dependent promoter in a 1.7 kb SmaI
fragment (position 27,094 - 28,818 in pXB296). This
fragment stretches from the upstream region of ORF23 to the
5' part of dctA. The 58 bp intergenic region between ORF23
and dctA contains a stem-loop structure but no obvious
promoter sequence. Possibly the promoter that controls dctA
is located upstream of ORF23 (e.g. the minimal consensus
sequence included in GGGGGCACAATTGC at position 27,098 -
27,111). Although clones containing dctA complemented
mutants of R. meliloti and R. leguminosarum for growth on
dicarboxylates, the growth of the NGR234 dctA deletion
mutant was not affected (van Slooten et al., 1992).
Nevertheless, this mutant was unable to fix nitrogen in
nodules. Because dctA is now possibly part of a larger
transcription unit, the symbiotic phenotype may also result
from the inactivation of downstream genes.
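
A search for candidate promoter motifs of this kind can be sketched in Python with a regular expression. The pattern below, GG followed by ten arbitrary bases and then GC, is the commonly cited minimal form of sigma-54 (RpoN)-dependent promoters and is used here as an assumption; the text does not state which pattern or software was applied. The function name and the `first_position` parameter are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed minimal sigma-54 promoter pattern: GG, ten bases of any kind, GC.
# The lookahead lets overlapping candidate sites be reported as well.
SIGMA54_MINIMAL = re.compile(r"(?=(GG[ACGT]{10}GC))")


def find_minimal_sigma54_sites(seq: str, first_position: int = 1) -> List[Tuple[int, int, str]]:
    """Return (start, end, site) for each candidate site, forward strand only.

    start and end are inclusive coordinates, counted so that the first base
    of seq has the coordinate first_position.
    """
    hits = []
    for match in SIGMA54_MINIMAL.finditer(seq.upper()):
        start = match.start() + first_position
        site = match.group(1)
        hits.append((start, start + len(site) - 1, site))
    return hits
```

Applied to the 14-base sequence quoted above with `first_position=27098`, the function reports a single site spanning 27,098 to 27,111. Such a loose pattern matches frequently by chance, so hits are only candidates that need supporting evidence such as upstream activator sequences.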

Interestingly, the GC content of the predicted pXB296
ORFs ranges from 53.3 mol% to 64.6 mol%, with an overall
cosmid GC content of 58.5 mol%. Genomes of Azorhizobium,
Bradyrhizobium, and Rhizobium species have GC contents of 59
mol% to 65 mol% (Padmanabhan et al., 1990), with 62 mol%
reported for Rhizobium sp. NGR234 (Broughton et al., 1972).
Although pXB296 covers <7% of the complete symbiotic plasmid
sequence, its lower overall GC value suggests that symbiotic
genes might have evolved by lateral transfer from other
organisms. In this case, methods of the type applied in the
present invention will become even more relevant in
sequencing the whole genome.
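
For readers who wish to reproduce GC values of this kind from a sequence file, a minimal Python sketch follows. The function name is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption rather than something stated in the text.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence in mol% (0 to 100).

    Only unambiguous bases (A, C, G, T) are counted; other symbols
    such as N are ignored (an assumption of this sketch).
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)
```

Applying the same function to each predicted ORF and to the whole cosmid gives per-ORF and overall values that can be compared with the genome-wide figures cited above.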

Genetic Organization of the 100 kb region covered by cosmids
pXB296, pXB368 and pXB110


Extending the analysis of pXB296 to a 100 kb region
stretching from position 417,796 to 517,279 on the symbiotic
plasmid pNGR234a led initially to the assignment of only 76
ORFs listed within Table 3 (excluding the first incomplete
ORF noted in the analysis of pXB296 ("ORF1" of Table 2)).
The ORFs y4tQ to y4vJ (excluding ORFs y4uD and y4uG and
excluding ORF-fragments fu1, fu2, fu3, fu4 and fv1; see
Table 3) are identical to the ORFs 2 to 28 of the analysis
of pXB296 in Table 2 apart from minor revisions (N.B. the
analysis recited in Table 3 should be taken as the
definitive analysis - Table 2 merely represents preliminary
findings). The cosmid pXB110, which was sequenced with the
dye primer shotgun sequencing strategy in order to compare
it with the dye terminator shotgun sequencing strategy used
to sequence cosmid pXB296, in combination with pXB296 and
pXB368 covers nearly this entire region. A PCR product and
a restriction fragment of cosmid pXB564 also had to be
sequenced in order to fill in the gap from position 480,607
to 483,991 between cosmids pXB368 and pXB110 (Figure 4).
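
The sizes implied by the coordinates above can be checked with a short Python calculation. It assumes that positions are 1-based and inclusive at both ends, which is the usual convention for sequence coordinates but is not stated explicitly in the text; the function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a region given 1-based, inclusive coordinates (assumed)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Region analysed across the three cosmids: 99,484 bp, i.e. roughly 100 kb.
region_bp = span_length(417_796, 517_279)

# Gap between the two flanking cosmids: 3,385 bp.
gap_bp = span_length(480_607, 483_991)
```

Under this convention the analysed region is 99,484 bp and the gap closed by the PCR product and the restriction fragment is 3,385 bp.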

Among the 76 predicted ORFs, 7 ORFs and their deduced
proteins show no homologies to database entries. The other
predicted ORFs and their deduced proteins do exhibit such
homologies and therefore play putative roles in nitrogen
fixation (ORFs y4uJ to y4vB, y4vE, y4vN to y4vR, y4wK and
y4wL), nodulation (ORFs y4yC and y4yH), transportation (ORFs
y4tQ to y4uA, y4vF and y4wM), secretion of proteins or other
biomolecules (ORFs y4yI and y4yO), transcriptional
regulation/DNA binding (ORFs y4wC and y4xI), in amino acid
metabolism or metabolism of amino acid derivatives (ORFs
y4uB, y4uC, y4uF, y4wD, y4wE and y4xN to y4yA), degradation
of xenobiotic compounds (ORFs y4vG to y4vI), in
peptidolysis/proteolysis (ORFs y4wA and y4wB) or
transposition (ORFs y4uE, y4uH and y4uI) (see Table 3). The
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prob. conjugal transfer protein (relaxase) 


prob. conjugal transfer protein 


prob. conjugal transfer protein 


prob. conjugal transfer protein 


fragments hom. to ORFL! (conjugal transfer regionl); 
frameshifts: 83072 (1>3). 83161 (3>2) 


1 hypothetical 22.9 kd protein 


hypothetical 20.6 kd protein 


hypothetical 9,9 kd protein 


hypothetical 1 1.6 kd protein 


put. fragments; homology to mercuric reductase, put. 
frameshifts: 86592 (-l<-3), 87288 (-3<-2) 


hyp. 34.2 kd protein; hom. to 5'end. of traC-1 from 
plasmid RP4 


put. phosphodiesterase; low homology to 
glycerophosphoryl-diester-phosphodiesterase 


hyp. 38,5 kd protein 


fragments of put. transposase; put. frameshift: 93798 
(2<3) 


put. integrase/recombinase ("phage-type"); similar to 
Y4rF (35% aa-id,); low similarity to Y4rABCDE 


put. defective integrase/recombinase 


fragments hom. to integrase; put. frameshift: 95559- 
95671 (-2<-l) 


nodulation protein; hyp. acetyl transferase 


hyp. 11.1 kd protein with transmembrane domain 


hyp. 10.3 kd protein fragment, hom. to C-terminal part of 
bacterial aminotransferases 


hyp. short chain type dehydrogenase/reductase 


hyp. short chain type dehydrogenase/reductase 


put. fragment; put. frameshifts: 100721 (1>2). 101728 
(2>1) 


put. truncated transposase-like protein; similar to Y4pO 


hyp. 11,5 kd protein 


hyp. 24.5 kd protein 


prob. methyl-accepting chemotaxis protein

hyp. 73.7 kd protein

hyp. (fragmentous?) monooxygenase; extended homology to DszA in fr. 2: 110372 to 110506

hyp. 24.6 kd integral membrane protein

hyp. 17.2 kd protein precursor

hyp. 19.5 kd protein

hyp. 14.5 kd protein

hyp. 11.6 kd protein

hyp. protein fragment, similar to central region of oligo/di-peptide ABC transporter ATP-binding proteins

put. outer membrane protein (porin) precursor

put. transcriptional regulator (AraC family)

hyp. 29.1 kd integral membrane protein, belongs to the inositol monophosphatase family

hyp. 35.5 kd protein

prob. ABC transporter permease protein; put. part of binding-protein-dependent transport system Y4fNOP

prob. ABC transporter ATP-binding protein

prob. ABC transporter periplasmic binding protein precursor

hyp. 41.6 kd protein; belongs to "ROK" family (transcriptional regulator or transferase)

hyp. 60.5 kd protein, hom. to invasion plasmid antigen H

hyp. 20.9 kd protein; low similarity to Y4rE

hyp. 16.1 kd protein

put. integrase/recombinase ("phage-type")

hyp. 10.5 kd protein

hyp. (fragmentous?) 27.7 kd protein; put. frameshifts: 131532 (2>1), 131892 (1>2)

prob. dTDP-D-glucose-4,6-dehydratase (Y4gFGH inv. in dTDP-L-rhamnose biosynthesis)

prob. dTDP-4-dehydrorhamnose-3,5-epimerase (inv. in dTDP-L-rhamnose biosynthesis)


prob. ABC transporter ATP-binding protein

hyp. 45 kd protein

put. ionic transporter

nodulation protein (put. sulfate transferase)

nodulation protein (unknown function)

inv. in O-carbamoylation of Nod factors (sim. to NodU)

prob. ABC transporter permease (see nodI)

prob. ABC transporter ATP-binding transport protein; put. role: together with NodJ export of modified beta-1,4-N-glucosamine oligosaccharides

N-acetylglucosaminyltransferase

chitooligosaccharide deacetylase

N-acyltransferase; nodABC involved in synthesis of backbone of modified N-acylated glucosamine oligosaccharides

hom. to part of coproporphyrinogen III oxidase (lacks C-terminus and conserved N-term. domain)

hyp. 25.4 kd integral membrane protein

hyp. 9.6 kd protein

hyp. 43.9 kd protein (partially hom. to glucose-fructose oxidoreductase)

hyp. 16 kd protein; partially hom. to Y4iB and Y4K}

hyp. 12.8 kd protein

hyp. 61.7 kd protein; similar to Y4aQ, Y4jD and Y4qI

hyp. 21.7 kd protein

hyp. 8.8 kd protein

hyp. transposase fragment similar to R. meliloti ISRm2011-2


put. defective transposase (homologous to N-terminal parts of Y4iO and Y4rJ)

put. defective transposase (hom. to C-terminal parts of Y4iO and Y4rJ); additionally weak homology to Y4pF/Y4sB and Y4qE (<30% identity)

hyp. protein (homolog located in a polysaccharide

hyp. 10.5 kd (fragmentous?) protein

hyp. 26 kd protein precursor

hyp. 76.2 kd integral membrane protein

hyp. 65.5 kd protein; low similarity to Y4iM

hyp. 26.8 kd protein; y4iKL: two fragments of one gene?; put. frameshift: 181884 (-3<-2)

hyp. 47.8 kd protein; y4iKL: two fragments of one gene?; put. frameshift: 181884 (-3<-2)

hyp. 47.1 kd protein; low similarity to Y4iJ; y4iMN: two fragments of one gene?; put. frameshift: 184440 (-2<-3)

hyp. 22.1 kd protein precursor; y4iMN: two fragments of one gene?; put. frameshift: 184440 (-2<-3)

put. transposase or transposase-fragment; additionally weak homology to Y4pF/Y4sB and Y4qE (<30% identity)

hyp. 14.4 kd protein or fragment hom. to N-term. of Y4rJ

identical to Y4nD/Y4sD; put. insertion sequence ATP-binding protein; similarity to Y4bM/Y4kI/Y4tA, Y4uH and weakly to Y4dL

identical to y4nE/y4sE; hyp. 57.2 kd protein with low similarity to IS21/IS408/IS1162 transposases


hyp. 16.7 kd protein; partial similarity to Y4hN; low similarity to Y4rG

hyp. 13.1 kd protein; see y4hO

hyp. 56.7 kd protein; see y4hP

hypothetical (fragmentous?) 29.4 kd integral membrane protein; put. frameshift: 192996 (1>2; end of shifted ORF at 193183)

hyp. 55.4 kd integral membrane protein

hyp. 17.9 kd transmembrane protein

hyp. 23 kd protein

hyp. 13.6 kd protein

put. plasmid stability protein

put. plasmid stability protein

hyp. 25.1 kd protein

hyp. 8 kd protein or protein fragment

hyp. 16.3 kd protein

hyp. 36.1 kd protein; y4jOP: two fragments of one gene?, put. frameshift: 202550 (-3<-1)

hyp. 29.5 kd protein; y4jOP: two fragments of one gene?, put. frameshift: 202550 (-3<-1)

hyp. 115.9 kd protein

hyp. 17.3 kd protein

hyp. 44.8 kd protein

hyp. 36.4 kd protein precursor

hyp. 36.7 kd protein

hyp. 15.2 kd integral membrane protein

hyp. fragment; sim. to Y4hP, Y4jD and Y4qI; additional homology to ORF14 in fr. +3/+2: 212331-212509

hyp. 60.4 kd protein

hyp. 38 kd protein; y4kEF: two fragments of one gene?, put. frameshift: 215616 (-1<-2)

hyp. 47.4 kd protein; y4kEF: two fragments of one gene?, put. frameshift: 215616 (-1<-2)

hyp. 7.7 kd protein

hyp. 14.1 kd protein


see y4bM

see y4bL

hyp. 34.9 kd protein

hyp. 37.6 kd AAA-family ATPase protein

hyp. 13.1 kd protein

hyp. 15.7 kd protein

hyp. 9.2 kd protein

hyp. 11 kd protein

hyp. (fragmentous?) 15.3 kd protein; homology to hipO fragments on the complementary strand

fragments hom. to HipO

hyp. 4.8 kd (fragmentous?) protein (smallest ORF predicted to be a protein); hom. to N-term. of protein in crtE-crtX intergenic region

hyp. 33.2 kd protein

hyp. 55.1 kd protein

cytochrome P-450 BJ-4 homolog

short-chain type dehydrogenase/reductase

put. P450-system 3Fe-3S ferredoxin

cytochrome P-450 BJ-3 homolog

cytochrome P-450 BJ-1 homolog

hyp. 7.6 kd protein fragment, homology to ORF8 fragments also upstream of fl3 up to 236048

homology to hupK/hupJ fragments (fr. -3/-2)

hyp. 36.1 kd protein

hyp. 17.4 kd protein

hyp. 22.4 kd protein; hom. to cell filamentation/division protein

hyp. 7.3 kd protein

hyp. 18.1 kd protein

fragments of transposase (ISRm4)

hyp. 14.3 kd protein

put. truncated transposase; hom. to N-term. of TnpA (transposon Tn163); strong similarity to C-terminus of F15


hyp. 18.1 kd protein

hyp. 29 A kd protein; hom. to avirulence protein; put. frameshift according to homolog: 247230-247293 (-2<-3); end of shifted frame: 246960

hyp. protein fragment; strong similarity to part of F14

put. fragmentous transposase; homologous to C-term. of transposase (Tn1546)

hyp. 56.8 kd protein

put. integrase/recombinase ("resolvase-type")

hyp. 15.8 kd protein

fragments hom. to xylitol-dehydrogenase

hyp. 24.6 kd outer membrane protein precursor

hyp. 26.2 kd protein precursor

hyp. 10 kd protein

hyp. 45.7 kd protein

hyp. transcriptional regulator; very low similarity to phage repressor proteins

hyp. 7.8 kd protein

hyp. 33.9 kd protein

prob. ABC transporter periplasmic binding protein precursor (transport system Y4mIJK probably transports a sugar)

prob. ABC transporter permease

prob. ABC transporter ATP-binding protein

put. permease (E. coli YiaN/YgiK family)

put. permease (SBR family 7)

hyp. transketolase family protein (fragmentous?); hom. to C-term. of transketolases

hyp. transketolase family protein (fragmentous?); hom. to N-term. of transketolases

hyp. transcriptional regulator (LysR family)

prob. peptidase; very low similarity to Y4qF and Y4sO (<25% identity)

inv. in 6-O-carbamoylation of Nod factors; similar to Y4hD


methyltransferase inv. in Nod-factor synthesis

see y4iO

see y4iA

hom. to virG fragments; similar to fq3

hyp. 25.4 kd protein precursor; low similarity to Y4aO (<30% id.)

fragments hom. to ORF2 (IS-ATP-binding protein) from IS1162

put. NAD-dep. nucleotide sugar epimerase/dehydrogenase

hyp. 12.3 kd integral membrane protein (some similarity to ethidium bromide resistance proteins)

hyp. 13 kd transmembrane protein

hyp. GMC-type oxidoreductase

hyp. integral membrane protein

put. NAD-dep. nucleotide sugar epimerase/dehydrogenase

hyp. 65.2 kd protein; homolog inv. in production of the translation inhibitor microcin C7

hyp. 14.7 kd protein

hyp. 26 kd protein

hyp. 23.5 kd protein

hyp. 7.4 kd protein

fo1 and fo2: two fragments of one put. gene; put. frameshift: 299664 (-2<-3)

homology to 5' part of ORF11; fo1 and fo2: two fragments of one putative gene; put. frameshift: 299664 (-2<-3)

fo3 and fo7: transposase-like protein interrupted by NGRIS-6

hyp. fragment; fo4/5/6: fragments of one gene similar to Y4bA/Y4pH

hyp. fragment; fo4/5/6: fragments of one gene

hyp. fragment; fo4/5/6: fragments of one gene

hyp. 9.6 kd protein

hyp. 16.8 kd protein

hyp. 8.1 kd protein


fo3 and fo7: transposase-like protein interrupted by NGRIS-6

prob. ABC transporter binding protein (Y4oPQRS: sugar-like transport system)

prob. ABC transporter permease protein

prob. ABC transporter permease protein

prob. ABC transporter ATP-binding protein

hyp. 20.6 kd protein; homologous to N-terminus of Y4pA, and weakly to Y4oV

hyp. 43.1 kd protein precursor

hyp. 30.2 kd protein; homologous to N-terminus of Y4pA, and weakly to Y4oT

hyp. 23.7 kd protein

prob. NAD-dep. oxidoreductase

put. transcriptional regulator (sigma54-dep.)

prob. trehalose-6-phosphate phosphatase

prob. trehalose-6-phosphate synthase; similar to fq1/2

fragments homologous to ORF3; put. frameshift acc. to homologue: 319122 (3>1)

fragment homologous to ORF1 from IS1248 (fr. 3); similar to fs4

put. transcriptional regulator (MucR family); missing Zn finger motif; similar to Y4aP

identical to y4sA; hyp. 15.5 kd protein hom. to N-term. of RFRS9 25 kDa protein

identical to y4sB; put. transposase; low similarity to Y4qE, Y4iB and Y4iO (<30% aa-id.)

identical to y4sC; hyp. 21.1 kd protein

"ORF" homologous to ORF1 of IS1162 interrupted by stop codon (323444)


see y4bA

see y4bC

see y4bD

fragment homologous to put. IS-ATP-binding protein

put. insertion sequence ATP-binding protein; similarity to Y4bM/Y4kI/Y4tA, Y4uH, and weakly to Y4iQ/Y4nD/Y4sD (<30% aa-id.)

hyp. 30.9 kd protein

put. frameshift: 331032 (2<1)

probable symbiotic regulator (LysR family)

prob. transposase (Mutator family); similarity to fe7

join fq1+fq2: hom. to trehalose-6-phosphate synthase interrupted by ISRm3-like element NGRIS-8; similarity to Y4pC (45% aa-id.)

virG homologous fragments: stop at 37380; put. frameshift at 337844 (3>2); similar to fn1

hyp. 18.8 kd protein

hyp. 63.6 kd protein

hyp. 26.8 kd protein, similar to N-terminus of Y4rO

prob. transposase; low similarity to Y4pF/Y4sB, Y4iB, Y4iO and Y4rJ (<30% aa-id.)

fragments homologous to XerC (integrase)

prob. peptidase (S9A family); high similarity to Y4sO; low similarity to Y4nA (<25% id.)

prob. aminotransferase (class 3)

hyp. transcriptional regulator (LuxR family)

hyp. 59.7 kd protein; similar to Y4aO, Y4hP, Y4iD

fragments fq5 and ft3 represent one put. gene similar to Y4hO and Y4iC interrupted by IS elements

put. transposase

put. integrase/recombinase ("phage-type"); similar to Y4rF; low similarity to Y4rABCDE

put. defective integrase/recombinase ("phage-type"); 75% nt-identity: 356436-356710 and 94988-95262 rR-201

put. integrase/recombinase ("phage-type")


put. integrase/recombinase ("phage-type")

put. integrase/recombinase ("phage-type")

put. integrase/recombinase ("phage-type"); low similarity to Y4fiA

hyp. 14.8 kd protein (IS866 family); low similarity to Y4iB, Y4hN

put. ligase; hom. to biotin carboxylases

85% aa-identity to part of Y4rL

put. frameshift: 367296 (-2<-1)

hom. to N-term. of Y4hO; see fq5

hyp. 44 kd protein

put. transposase; low similarity to Y4qE (<30% aa-id.)

hyp. 14.5 kd protein

hyp. 17.7 kd protein; y4rLM: two fragments of one gene?; put. frameshift: 371972 (-2<-3); 85-99% aa-identity to parts of Y4zA and fr1

hyp. 39.4 kd protein; see y4rL

hyp. 41.6 kd integral membrane protein

hyp. 69.3 kd protein; N-terminus: hom. to Y4qD; C-terminus: hom. to C-terminus of histidinol-l-phosphate transaminase

sim. to Y4rG; put. frameshift: 377376 (1>3); hom. to fragment of ORFA3 (377409-377540)

see y4pE

see y4pF

see y4pG

see y4iO

see y4iA


put. defective transposase; sim. to fs1

prob. amino acid ABC transporter binding protein (periplasmic); prob. part of binding-protein-dep. transport system Y4tEFGH

prob. amino acid ABC transporter permease protein

prob. amino acid ABC transporter permease protein

prob. amino acid ABC transporter ATP-binding protein

put. peptidase (M40 family)

put. threonine dehydratase

hyp. cyclodeaminase (sim. to ornithine cyclodeaminase)

hyp. hydrolase/peptidase (M24 family)

put. hydrolase/peptidase (M24 family)

hyp. 19.6 kd protein

prob. peptide ABC transporter binding protein precursor; prob. part of a binding-protein-dependent transport system Y4tOPQRS

prob. peptide ABC transporter permease protein

prob. peptide ABC transporter permease protein; 418611: C or T possible

prob. peptide ABC transporter ATP-binding protein

prob. peptide ABC transporter ATP-binding protein

put. cell wall compound biosynthesis protein; almost identical to Y4sH

prob. aminotransferase (class 3)

prob. aldehyde dehydrogenase

put. protein fragment; 67% id. to N15K in 26 aa

fragment 65% identical to C-term. of beta-keto-thiolase

hyp. 18.7 kd protein

put. transposase (IS110 family); put. frameshift: between 427040 and 427180 (-2<-3; end of shifted ORF: 426699)

prob. glutamate dehydrogenase

put. transposase fragment (92% id. in 16 aa); 85% nt-identity to 3'-term. part of ISRm5

hyp. 7.8 kd protein

put. insertion sequence ATP-binding protein; similarity to Y4pL, Y4bM/Y4kI/Y4tA and Y4iQ/Y4nD/Y4sD (IS21/IS1162 family)


put. transposase; similarity to Y4bL/Y4kJ/Y4tB (IS21/IS1162 family)

put. transposase fragments (74-92% id. in 88 aa); 79% nt-identity to 5'-term. of ISRm4

hyp. 8.5 kd protein

put. nitrogen fixation NifZ protein

prob. 4Fe-4S ferredoxin

involved in FeMo cofactor biosynthesis

positive regulator of nif, fix, and additional genes (sigma54-dep.)

prob. 3Fe-3S ferredoxin inv. in nitrogen fixation

required for nitrogenase activity

putatively inv. in a redox process in nitrogen fixation

putatively inv. in a redox process in nitrogen fixation

put. NifS fragment (70% identity in 24 aa)

hyp. 11 kd protein (HesB/YadR/YfhF family); homologues located upstream of nifS

put. redox enzyme (hom. to glutaredoxin-like membrane protein and peroxisomal membrane proteins)

putatively involved in Mo cofactor processing

C4-dicarboxylate transport protein; nt-deletion at 446416 in comparison to sequence of acc. no. S38912 causing a frameshift (DctA1 is 48 aa longer than DctA1 in S38912)

prob. cytochrome P450

hyp. 24.6 kd protein (with very weak homology to gamma-hexachlorocyclohexane dechlorinase)

short-chain type dehydrogenase/reductase

put. monooxygenase; similar to Y4wF

involved in FeMo cofactor biosynthesis

nitrogen fixation protein

hyp. 17.7 kd protein, similar to proteins of other nitrogen-fixing bacteria and to Y4xD

similar to N-term. of Fe protein of nitrogenase

prob. 4Fe-4S ferredoxin

hyp. zinc protease (M16 family); sim. to Y4wB

put. protease (lacks Zn-binding site; M16 family); sim. to Y4wA

put. DNA-binding protein; high similarity to Y4aM

permease-type protein; hom. to membrane protein from the rhizopine biosynthesis (mosABC) gene cluster

prob. aminotransferase (class 2)

put. monooxygenase; sim. to Y4vJ

hyp. 19.4 kd protein

hyp. 15.6 kd protein

hyp. 59 kd protein

hyp. 13.3 kd protein

NifW protein homolog; required for full activity of FeMo protein

prob. NifS protein (member of class-5 pyridoxal-phosphate-dep. aminotransferase family)

put. ABC transporter binding protein (transporter or enzymatic function)

hyp. truncated transporter-like protein; hom. to N-term. of DctA (see y4vF); two frameshifts acc. to homologue: 481606 (-3<-1); 481530 (-2<-3; homology stops at 481419)

hyp. 11 kd protein

hyp. 14.9 kd protein

Fe protein of nitrogenase

alpha-subunit of MoFe protein of nitrogenase

beta-subunit of MoFe protein of nitrogenase

hyp. 18 kd protein; similar to proteins of other nitrogen-fixing bacteria and to Y4vO


hyp. 7.6 kd protein; similar to proteins of other nitrogen-fixing bacteria

hyp. 6.5 kd protein

put. exopolysaccharide production repressor (integral membrane protein)

hyp. 55.5 kd protein

transcriptional regulator (LysR family); high similarity to Y4aL (NodD1)

signal transduction-type regulator

hyp. protein hom. to proteins of the general secretion pathway (PulD family), sim. to Y4vD (NolW)

hyp. 20.6 kd protein precursor

hyp. 37.1 kd protein

permease-type protein

hyp. 71 kd protein hom. to aerobactin synthetase subunit

hyp. 40.9 kd protein

put. cysteine synthase

hyp. 49.9 kd protein; low similarity to diaminopimelate decarboxylase

hyp. 17.1 kd protein

nodulation protein as in R. fredii USDA257

nodulation protein (PulD family); sim. to Y4xJ

nodulation protein

nodulation protein precursor (YscJ homolog; M74011)

nodulation protein

homologous to two (nodulation) proteins of R. fredii USDA257 (YscL homolog; M74011)

prob. ATPase involved in secretion

hyp. 20.4 kd protein

prob. translocation protein inv. in secretion processes (FliN/MopA/SpaO family)

prob. translocation protein inv. in secretion processes (FliP/MopC/SpaP family)

prob. translocation protein inv. in secretion processes (FliQ/MopD/SpaQ family)

prob. translocation protein inv. in secretion processes (FliR/MopE/SpaR family)





M61751

L38460
this work

L13395

J02451

X59939

X76100

D26185

L12251

L12251

L12251

L12251

L12251

L12251

U00998
L12251

L12251

L25667
L12251

L25667
L12251

L25667
L12251

L25667
L12251








ExoX

NodD2
NodD1

PmrA

GPIV

IucC

CysK

NolX

NolW

NolB

NolT

NolU

ORF4
NolV

YscN
HrcN

ORF7

YscQ
HrcQ

YscR
HrcR

YscS
HrcS

YscT
HrcT




14-83

1-312
1-310

1-224

76-378

23-403

183-505

5-304

1-596

1-234

1-164

1-289

1-212

1-60
73-208

35-450
1-80
105-450

1-178

171-350
1-358

6-216
1-222

28-250
1-272




488973-489149

489281-489583

490010-491527

491655-492593

494297-494977

495157-496428

496438-497004

498719-499933

499930-501816

501816-502955

502952-503962

503963-505336

505336-505800

505950-507740

508021-508725

508881-509375

509385-510254

510251-510889

510891-511517

511514-512869

512845-513381

513406-514482

514475-515143

515143-515418

515427-516245




nolT














hrcS 


hrcT 





role of some ORFs, such as the luciferase-like ORFs (y4vJ and
y4wF; see Table 3), in rhizobia is still not clear. In the
100 kb region, the duplication of a 5 kb sequence (positions
451,886 to 456,157 and 483,764 to 488,035) that includes the
genes nifHDK is remarkable. These genes encode the basic
subunits of the nitrogenase. Furthermore, the
transcriptional regulator nodD2 is of particular interest because
its role does not seem to be identical to that of a previously
identified nodD2 in a closely related strain (Appelbaum et
al., 1988; data not shown). The pmrA-homologous ORF
y4xI also putatively plays an important role in regulating
symbiotic processes, because a nod box (binding region for
the basic regulator nodD1; Fisher and Long, 1993) is located
upstream of this ORF (positions 493,962 to 494,000).
Finally, the presence of ORFs (y4yI and y4yK to y4yN; see
Table 3) homologous to type III secretion proteins, which
were previously known only in bacteria pathogenic to plants,
animals or humans, suggests that the difference between the
symbiotic and pathogenic abilities of microorganisms is
subtle.
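
The coordinates quoted above for the duplicated nifHDK region can be checked with a few lines of arithmetic. The sketch below assumes that the positions are 1-based and inclusive (the usual convention for sequence annotations, but an assumption here); the helper name span_length is illustrative only.

```python
# Coordinates of the two copies as quoted in the text
# (assumed to be 1-based, inclusive positions on pNGR234a).
copies = [(451_886, 456_157), (483_764, 488_035)]


def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


lengths = [span_length(start, end) for start, end in copies]
# Distance between the first bases of the two copies.
offset = copies[1][0] - copies[0][0]

print("copy lengths (bp):", lengths)
print("offset between copies (bp):", offset)
```

Under this assumption both copies have the same length, which is consistent with a duplication, and the second copy lies roughly 32 kb downstream of the first.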

In a second stage, the remaining 436 kb of pNGR234a
were analyzed. Several ORFs and their deduced proteins were
identified that belong to functional groups not previously
identified in the analysis of cosmids pXB296, pXB368 and
pXB110 (replication of the plasmid, conjugal transfer of the
plasmid, functions in oligosaccharide biosynthesis and
cleavage, functions in sugar or sugar-derivative metabolism,
functions in lipid or lipid-derivative metabolism, functions
in chemoperception/chemotaxis, functions in biosynthesis of
cofactors, prosthetic groups and carriers, etc.).

Although further functional analyses of selected ORFs
in pNGR234a still have to be performed, large-scale
sequencing gives a global picture of their genomic
organization and possible roles. Determination of putative
functions of predicted genes by homology searches and
identification of sequence motifs (promoters, nod boxes,
nifA activator sequences, and other regulatory elements)
will aid in finding new symbiotic genes. High-fidelity
sequence data covering long stretches of the genome are a
prerequisite for these studies. The use of the dye
terminator/thermostable Sequenase shotgun approach has
allowed the completion of the entire ~500 kb sequence of
pNGR234a and has opened up new avenues for the genetic
analysis of symbiotic function.



Genetic Organization of the Whole Plasmid pNGR234a

Within the complete nucleotide sequence of pNGR234a, which
comprises 536,165 bp, a total of 416 ORFs were predicted to encode
proteins. An additional 67 ORF fragments were detected that seem to
be remnants of functional ORFs.
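
This passage does not spell out how the ORFs were predicted. Purely as an illustration of what an open reading frame is, the following naive six-frame scan reports stretches that run from an ATG to the next in-frame stop codon. It is a sketch under simplifying assumptions (only ATG as start codon, no alternative starts such as GTG or TTG, no coding-potential scoring) and is not the procedure used for pNGR234a; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Yield (strand, frame, start, end) for each ATG...stop stretch.

    Coordinates are 0-based, half-open, and refer to the scanned strand
    (for the "-" strand, to the reverse complement). The stop codon is
    included in the reported interval and in the codon count.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ORF in this frame


# Toy sequence (not taken from pNGR234a): ATG AAA TAG in frame 2.
toy = "CCATGAAATAGGG"
orfs = list(find_orfs(toy))
print(orfs)
```

An ORF that is interrupted by a frameshift shows up in such a scan as two shorter stretches in different frames, which is the situation the annotations "two fragments of one gene?" in Table 3 describe.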

Thirty-four percent (139) of the 416 potential proteins have no obvious similarities to
any known proteins. Of the remaining 277 proteins, 31 (8%) are similar to proteins for which
no biochemical or phenotypic role has been assigned, 12 (3%) are similar to proteins for which
limited biological data are available, and 234 (56%) are similar to proteins with a more precise
biological function: enzymes (95), proteins involved in integration and recombination of
insertion elements (44), transporters (32), transcriptional regulators (22), protein
secretion/export (21), proteins involved in replication and control of the plasmid (12), electron
transporters (6), and proteins involved in chemotaxis (2). A high proportion of enzymes was
expected of a symbiotic replicon involved in nodulation (Nod-factor biosynthesis, etc.) and
nitrogen fixation. As expected from the observation that NGR234 can be cured of its plasmid
(Morrison et al., 1983), no ORFs essential to transcription, translation or to primary
metabolism were found.
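
The counts in this paragraph can be cross-checked by simple bookkeeping. The sketch below only restates the numbers given in the text; the dictionary keys are shortened labels chosen for illustration.

```python
# Counts as stated in the text for the 416 predicted proteins.
TOTAL_ORFS = 416
no_similarity = 139   # no obvious similarity to known proteins
unknown_role = 31     # similar to proteins without an assigned role
limited_data = 12     # similar to proteins with limited biological data

# Proteins with a more precise biological function, by category.
functional = {
    "enzymes": 95,
    "integration/recombination of insertion elements": 44,
    "transporters": 32,
    "transcriptional regulators": 22,
    "protein secretion/export": 21,
    "plasmid replication and control": 12,
    "electron transporters": 6,
    "chemotaxis": 2,
}

assigned = sum(functional.values())
with_similarity = unknown_role + limited_data + assigned

print("proteins with a precise function:", assigned)
print("proteins with any similarity:", with_similarity)
print("all classes together:", no_similarity + with_similarity)
```

The eight functional categories add up to the 234 proteins quoted, and the four top-level classes together account for all 416 predicted proteins.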

A large number of protein families are present in several copies on pNGR234a. This is
true even after elimination of the many proteins which are encoded in repeated IS elements, or
are involved in transposition, integration or recombination. The most notable examples of
highly represented protein families include: five members of the short-chain
dehydrogenase/reductase family, one of which (y4vI) contains two homologous domains; five
complete and one partial ABC-type transporter operons that each encode at least one ABC-type
permease and an ABC-type ATP-binding protein; four cytochrome P450s; and three
members of peptidase family S9A. In total, 85 proteins belong to families that are represented
more than once and which do not seem to be linked to insertion or recombination.

The majority (330, 79%) of the putative proteins are probably located in the cytoplasm
of the bacterium, 62 (15%) possibly span membranes, 20 (5%) could be located in the
periplasm, 3 are predicted to be lipoproteins that could associate with the outer membrane, and
2 are probably outer membrane proteins. These observations accord well with the dominance of
biosynthetic proteins, as well as proteins involved in transcriptional regulation and
insertion/recombination, most of which are thought to be cytoplasmic.

Although other start points cannot be excluded, replication of pNGR234a probably
begins at oriV, which is located within the intergenic sequence (igs) between the repC- and
repB-like genes y4cI and y4cJ (positions 54,417 to 54,570). The replication locus encodes
three proteins with 40-60% amino acid identity to RepABC of pTiB6S3 (a Ti-plasmid of
Agrobacterium tumefaciens), pRiA4b (an Ri-plasmid of A. rhizogenes) and pRL8JI (a cryptic
plasmid of R. leguminosarum bv. leguminosarum). Amongst replication regions, the highest
identities (69 to 71% at the nucleotide level) are found in the igs between repC and repB
(Fig. 5). In Agrobacterium, these igs are the determinants that render parental plasmids
incompatible. Two ORFs (position 198,500), which are homologous to pseudomonal genes
involved in plasmid stability, may also play a role in replication of pNGR234a. A 12 bp
portion of the origin of transfer (oriT) is identical to that of pTiC58 of Agrobacterium
tumefaciens (nt 80,162 to 80,173), and highly similar to those of RSF1010 (Escherichia coli)
and pTF1 (Thiobacillus ferrooxidans). This sequence corresponds to the oriT of plasmids
containing the "Q-type nick-region" (Fig. 6).

Another 24 predicted ORFs show homologies to conjugal transfer genes of
Agrobacterium Ti-plasmids. All are located in two large clusters between positions 57,000 and
83,000. Since pNGR234a was believed to be non-transmissible (Broughton et al., 1987), the
fact that both the nucleotide sequences of the individual ORFs and their order are similar in
Agrobacterium and NGR234 came as a surprise. Conjugal transfer of Ti plasmids in A.
tumefaciens is controlled by a family of N-acyl-L-homoserine lactone auto-inducers (Zhang et
al., 1993). Similar molecules, which are able to interact with the traR gene product of A.
tumefaciens, were detected in the supernatants of NGR234 cultures using the assay of Piper et
al. (1993).

different ORFs derived from IS-like sequences; partially known as acc. no. X74068 ("Region2" 
from pNGR234a); 164853-167086: 66% nt-id. to IS66 

Reiterated sequences first became apparent in NGR234 during the construction of an 
ordered array of cosmid clones (Perret et al., 1991). It is now clear that 97 kbp (18 %) of 
pNGR234a represents insertion- (IS) and mosaic- (MS) sequences (Fig. 7). Homology 
searches for known IS/MS revealed some of these, while comparison of repeated sequences 
within pNGR234a, as well as between the plasmid and 2,500 random chromosome sequences 
(V. Viprey, pers. communication), located the rest. Seventy-five putative ORFs (18 % of the 
total) and 40 fragments of ORFs were identified this way, nearly half of which (44) show 
homologies to integrases and transposases. Many of these IS elements are similar not only to 
those derived from Rhizobium and Agrobacterium species, but also to those of other, diverse 
Gram (-) and Gram (+) bacteria (e.g. Bacillus, Escherichia, and Pseudomonas). The sheer 
number and diversity of these IS/MS elements suggests that NGR234 has functioned as a 
"transposon trap". This is supported by the fact that their average G+C content (61.5 %) is 3 % 
higher than that of pNGR234a (58.5 %). Interestingly, many IS/MS are clustered between 
positions 300,000 and 390,000 (Fig. 7), while some loci are almost unaffected by insertions 
(oriV, nod-, fix- and nif-ORFs). Small IS/MS clusters divide the replicon into large blocks of 
often functionally related ORFs (e.g. blocks of nod-ORFs, replication and conjugal transfer 
ORFs, nif-ORFs and fix-ORFs). A list of all sequences with IS-element or mosaic sequence 
character is given in Table 4. Although transposition of these IS/MS elements has not been 
demonstrated, transfer of plasmids amongst rhizobia in the legume rhizosphere (Broughton et 
al., 1987) and to other non-symbiotic bacteria in fields (Sullivan et al., 1995) suggests that 
lateral transfer of genetic information has helped shape symbiotic potential. 
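
The G+C comparison quoted above (61.5 % for the IS/MS elements against 58.5 % for the whole replicon) is a simple base count. The following is a minimal sketch of such a calculation, not the procedure actually used for pNGR234a; the function name and the two toy sequences are hypothetical and serve only as illustration.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Ambiguous symbols (e.g. N) are ignored, which is an assumption of
    this sketch rather than something stated in the text.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical toy sequences, for illustration only.
element = "GGCCGCATGCGC"
replicon = "ATGCATGCATAT"
difference = gc_content(element) - gc_content(replicon)
print(f"G+C difference: {difference:.1f} percentage points")
```

Applied to an IS/MS element and to the complete plasmid sequence, the difference between the two values corresponds to the 3 % figure quoted in the text.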

Carbohydrates are constituents of the rhizobial cell wall as well as morphogens called 
Nod-factors (short tri- to penta-mers of N-acetyl-D-glucosamine, substituted at the 
non-reducing terminus with C16 to C18 saturated or partially unsaturated fatty acids). Elements of 
the biosynthetic pathways leading to cell walls or to lipo-chito-oligosaccharides (Nod-factors) 
are common. Most differences are found in the later stages of the pathways that lead to specific 
cell-wall components or to Nod-factors. 

As befits a symbiotic replicon, only 13 ORFs with homology to polysaccharide 
synthesis genes (house-keeping genes sensu stricto) are located on the plasmid (Table 3). 
Sequences homologous to exoB, exoF, exoK, exoL, exoP, exoU, and exoX (X. Perret and V. 
Viprey, unpublished), and exoY (Gray et al., 1990) are clearly located on the chromosome. 
Although loci with weak homologies to nod-box::psiB of R. leguminosarum, and exoX of R. 
meliloti exist on the plasmid (y4iR and y4xQ, respectively), these are regulatory rather than 
structural genes, suggesting that almost all cell wall polysaccharide synthesis ORFs are 
chromosomally located. 

Except for nodPQ and nodE, at least one copy of all the regulatory and structural ORFs 
involved in Nod-factor biosynthesis seems to be located on the plasmid. The activity of most 
nodulation genes is modulated by four transcriptional regulators of the lysR family. These are 
nodD1 (y4aL), syrM1 (y4pN), nodD2 (y4xH), and syrM2 (y4zF). NodC, which is an 
N-acetylglucosaminyltransferase, the first committed enzyme in the Nod-factor biosynthetic 
pathway, is part of an operon which includes nodABCIJnolOnoeIE (y4hI to y4hB, Table 3). 
Together, these genes, which form the hsnIII locus, are responsible for the synthesis of the 
core Nod-factor molecule, and the adjunction of 3- (or 4)-O-carbamoyl, 2-O-methyl, and 
4-O-sulfate groups (Hanin et al., unpublished). nodZ (y4aH), which encodes a fucosyltransferase, 
is part of the hsnI locus, which includes noeJ (y4aJ), noeK (y4aI), noeL (y4aG), nolK (y4aF), 
all of which are involved in the fucosylation of NodNGR factors (Fellay et al., 1995a). 
Wild-type NodNGR factors are also N-methylated and 6-O-carbamoylated, adjuncts which are added 
by the transferases encoded by nodS and nodU respectively [y4nC and y4nB; hsnII (Lewin et 
al., 1990)]. Possibly the only other enzyme which may be directly involved in Nod-factor 
biosynthesis is that encoded by nolL (y4eH, Table 3). As the 2-O-methylfucose residue of 
NGR234 Nod-factors is either 3-O-acetylated or 4-O-sulphated, an acetyltransferase is 
obviously required. Since NolL shows only limited homology to acetyltransferases, however, 
experimental proof of the transferase activity will be required. 

In contrast to R. leguminosarum and R. meliloti harbouring pNGR234a, A. 
tumefaciens (pNGR234a) transconjugants are incapable of nitrogen fixation (Broughton et al., 
1984), suggesting that some essential fix-ORFs are also carried by the chromosome. 
Nevertheless, more than 40 nif- and fix-ORFs are plasmid borne. Included amongst these is 
nifA (y4uN), which encodes a sigma-54 dependent regulator. Mutation of rpoN (which 
encodes sigma-54) causes a Fix- phenotype on NGR234 hosts (van Slooten et al., 1990). 
Similarly, mutation of fixF (y4gN) disrupts synthesis of a rhamnose-rich extra-cellular 
polysaccharide, and results in a Fix- phenotype on Vigna unguiculata, the reference host for 
NGR234 (unpublished). In fact, loci adjacent to fixF are probably responsible for the synthesis 
of dTDP-rhamnose from glucose-1-phosphate. Enzymes involved in this biosynthetic pathway 
include glucose-1-phosphate thymidylyltransferase (y4gH), dTDP-glucose-4,6-dehydratase 
(y4gF), dTDP-4-dehydrorhamnose-3,5-epimerase (y4gL), and dTDP-4-dehydrorhamnose 
reductase (y4gG). Rhamnose-rich lipopolysaccharides (LPS) seem to be necessary for 
complete bacteroid development and nitrogen fixation (Krishnan et al., 1995). Perhaps the 
enzyme encoded by y4gI is needed for the synthesis of the rhamnose-rich LPS from 
dTDP-rhamnose. 

Although not directly involved in the fixation process, mutation of the plasmid-borne 
copy of dctA (= dctA1, y4vF) also impairs nitrogen fixation (van Slooten et al., 1992). Other 
nif- and fix-ORFs are involved in elaboration of the electron-transfer complex (fixAB), in 
various cofactors required for nitrogen fixation (e.g. fixC, nifB, nifE, nifN, etc.), and in the 
synthesis of ferredoxins (fdxB, fdxN, fixX). Finally, those ORFs involved in the synthesis of 
the nitrogenase complex are also present. Amongst these are two functional copies of the 
nifKDH ORFs (y4vM to y4vK and y4xC to y4xA) (Badenoch-Jones et al., 1989). 
Additionally, 17 new ORFs located within the nitrogen fixation cluster (see Fig. 7; ORFs y4vC 
to y4vJ with the exception of dctA1, y4wA to y4wG, y4wI, y4wJ and y4xQ) are 
co-transcribed together with the ORFs homologous to known nif and fix genes. It thus seems 
likely that most ORFs necessary for bacteroid development and synthesis of the nitrogen-fixing 
complex are carried by pNGR234a. 

Two types of regulatory elements which frequently occur in pNGR234a are the NodD- and 
NifA/sigma-54-dependent promoters. NodD-dependent promoter-like sequences known as nod 
boxes have been identified by homology search within intergenic regions, using the following 
consensus sequence: 5'-YATCCAYNNYRYRGATGNNNNYNAT(3^AAACAATCRATT^ 
ACCAATCY-3' [12 mismatches allowed (van Rhijn and Vanderleyden, 1993); Y=C or T, R=A 
or G, N=A,C,G or T]. Putative NifA-dependent promoters (Fischer, 1994) have been 
predicted by screening for the NifA activator sequence (5'-TGT-N10-ACA-3') together with the 
sigma-54 promoter consensus sequence (5'-TGGCAC-N5-TTGCA/T-3' with GG and GC as 
the most conserved doublets; 3 mismatches allowed) separated by 60 to 150 nucleotides. The 
conserved promoter-like sequences identified in pNGR234a are listed in Tables 5 and 6. 
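
The screen for putative NifA-dependent promoters described above can be sketched as a consensus search with a mismatch allowance. The code below is an illustrative sketch only and is not the program used in this work. Its assumptions are: the NifA activator sequence must match exactly (the text gives no mismatch allowance for it), the 60 to 150 nucleotide separation is measured from the end of the activator sequence to the start of the sigma-54 consensus, and only the given strand is scanned (the reverse strand would be handled by scanning the reverse complement). All function and variable names are hypothetical.

```python
# IUPAC codes used here: R = A/G, Y = C/T, W = A/T, N = any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT",
}

UAS = "TGT" + "N" * 10 + "ACA"            # NifA activator sequence
SIGMA54 = "TGGCAC" + "N" * 5 + "TTGCW"    # sigma-54 consensus, W = A or T


def mismatches(window: str, motif: str) -> int:
    """Count positions where `window` is not permitted by the IUPAC `motif`."""
    return sum(1 for base, code in zip(window, motif) if base not in IUPAC[code])


def find_sites(seq: str, motif: str, max_mismatches: int) -> list[int]:
    """Return 0-based start positions where `motif` matches within the limit."""
    seq = seq.upper()
    width = len(motif)
    return [
        i for i in range(len(seq) - width + 1)
        if mismatches(seq[i:i + width], motif) <= max_mismatches
    ]


def find_nifa_promoters(seq, min_gap=60, max_gap=150, uas_mm=0, sigma_mm=3):
    """Pair each activator site with sigma-54 sites min_gap..max_gap nt downstream."""
    sigma_sites = find_sites(seq, SIGMA54, sigma_mm)
    pairs = []
    for u in find_sites(seq, UAS, uas_mm):
        uas_end = u + len(UAS)
        for s in sigma_sites:
            if min_gap <= s - uas_end <= max_gap:
                pairs.append((u, s))
    return pairs


# Synthetic test sequence (hypothetical): one activator site, a 70 nt spacer,
# and one perfect sigma-54 consensus.
demo = "A" * 5 + "TGT" + "C" * 10 + "ACA" + "A" * 70 + "TGGCACAAAAATTGCA" + "A" * 5
print(find_nifa_promoters(demo))
```

The same `find_sites` helper could be applied to the nod box consensus with 12 mismatches allowed, since it accepts any motif written in the IUPAC codes defined above.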
Tab. 5. nod box-like sequences in pNGR234a 

nod box  position in         orientation  number of mismatches    distance to the    name of the
         pNGR234a                         to the consensus        following ORF      following ORF
                                          sequence
1        4514 - 4562         -            11                      504                (fa1)
2        8481 - 8529         -            8                       87                 nodZ
3        12322 - 12370       -            7                                          ?#
4        97470 - 97518       -            6                       277                nolL
5        129615 - 129663     +            10                      1358               y4gE
6                                         8                       890
7        150280 - 150327                  11                      202                noeE
8        158820 - 158868                  4                       235                nodA
9        161891 - 161939     +            11                      1103               y4hM
10       169833 - 169881                  7                       117                y4iR
11       278947 - 278995                  7                       153                nodS
12       279821 - 279869     +            7                                          ?#
13       443101 - 443149                  10                      465                y4vC
14       473059 - 473107     +            9                       236                y4wH
15°      481253 - 481301                  16                      117                y4wM
16       493961 - 494009     +            6                       288                y4xI
17       532039 - 532087     +            5                       589                syrM2
18       256434 - 256482     +            12                      329                y4mC
19       469151 - 469199     +            12                      112                y4wE

° The majority of the mismatches is located in the 3'-terminal part of the sequence. 
# No predicted ORF can be found downstream of the putative nod box. 
Tab. 6. Putative NifA-dependent promoters in pNGR234a 

Nr.   NifA-dep. UAS*:    sigma-54 promoter     orientation  distance to the    name of the
      position           (-12/-24 region)#:                 following ORF      following ORF
                         position
1     90812 - 90827      90910 - 90924
2     162727 - 162742    162788 - 162802                    240                y4hM
3     235036 - 235051    234934 - 234948                    66                 y4lD
4     255021 - 255036    255130 - 255144                    306                y4mB
5     285265 - 285280    285343 - 285357                    50                 y4nG
6     436363 - 436378    436275 - 436289                    41                 nifB
7     442046 - 442061    441955 - 441969                    56                 fixA
8     442735 - 442750    442676 - 442690                    40                 y4vC
9     444109 - 444124    443983 - 443997                    104                y4vD
10    444137 - 444152    444241 - 444299°      +            38°                nifQ
11    451782 - 451799    451891 - 451905       +            88                 nifH1
12    460319 - 460334    460424 - 460438       +            63                 y4vR
13    463063 - 463078    463139 - 463153       +            48                 y4wA
14    478839 - 478854    478761 - 478775                    463                nifS
15°   483663 - 483678    483769 - 483783       +            88                 nifH2

* "Upstream Activator Sequence": NifA-binding site located 80 to 150 nt upstream of the 
transcription start point (5'-TGT-N10-ACA-3'). 

# Sequence corresponding to the consensus sequence of conserved sigma-54-promoters 12 nt 
upstream of the transcription start point: 5'-TGGCAC-N5-TTGC-3' (2 mismatches 
allowed). 

° 3 possibilities for a promoter (in two cases only corresponding to the minimal consensus 
5'-GG-N10-GC-3') 
EXAMPLES 



GENERAL METHODS 

Bacteria and Plasmids 

Escherichia coli was grown on SOC, in TB or in two-fold 
YT medium (Sambrook et al., 1989). The cosmid clones 
pXB296 and pXB110 (Perret et al., 1991) were raised in 
E. coli strain 1046 (Cami and Kourilsky, 1978). Subclones 
in M13mp18 vectors (Yanisch-Perron et al., 1985) were grown 
in E. coli strain DH5αF'IQ (Hanahan, 1983). 

Construction of Cosmid Libraries 

Cosmid DNA was prepared by standard alkaline lysis 
procedures followed by purification in CsCl gradients 
(Radloff et al., 1967). DNA fragments sheared by sonication 
of 10 µg of cosmid DNA were treated for 10 min at 30°C with 
30 units of mung bean nuclease (New England Biolabs, 
Beverly, MA, USA), extracted with phenol/chloroform (1:1), 
and precipitated with ethanol. DNA fragments, ranging in 
size from 1 to 1.4 kbp, were purified from agarose gels 
using Geneclean II (Bio101, Vista, CA, USA) and ligated into 
SmaI-digested M13mp18. Electroporation of aliquots of the 
ligation reaction into competent E. coli DH5αF'IQ was 
performed according to standard protocols (Dower et al., 
1988; Sambrook et al., 1989). 

M13 Template Preparation 

Fresh 1 ml E. coli cultures in twofold YT held in 
96-deep-well microtiter plates (Beckman Instruments, Fullerton, 
CA, USA) were infected with recombinant phages from white 
plaques grown on plates containing X-gal 
(5-bromo-4-chloro-3-indolyl-β-D-galactoside) and IPTG 
(isopropyl-β-thiogalactopyranoside). Rapid preparation of 
~0.5 µg of single-stranded M13 template DNA was carried out 
as follows: 190 µl portions of the phage cultures grown for 
6 hr at 37°C were transferred into 96-well microtiter plates. 
Lysis of the phages was obtained by adding 10 µl of 15% (w/v) 
SDS followed by 5 min incubation at 80°C. Template DNA was 
trapped using 10 µl (1 mg) of paramagnetic beads 
(Streptavidin MagneSphere Paramagnetic Particles Plus M13 
Oligo, Promega, Madison, WI, USA) and 50 µl of hybridization 
solution [2.5 M NaCl, 20% (w/v) polyethylene glycol 
(PEG-8000)] during an annealing step of 20 min at 45°C. Beads 
were pelleted by placing microtiter plates on appropriate 
magnets and washed three times with 100 µl of 0.1-fold SSC. 
The DNA was recovered in 20 µl of water by a denaturation 
step of 3 min at 80°C. When required, larger amounts of 
single-stranded recombinant DNA (>10 µg) were purified using 
QIAprep 8 M13 Purification Kits (Qiagen, Hilden, Germany) 
from 3 ml of supernatant of phage cultures grown for 6 hr at 
37°C. 

Sequencing 

Two sequencing methods were used: dye terminator and 
dye primer cycle sequencing, each in combination with 
either AmpliTaq DNA polymerase (Perkin-Elmer) or Thermo 
Sequenase (Amersham). All reactions, including ethanol 
precipitation, were performed in microtiter plates. Reagents 
were pipetted using 12-channel pipettes. Where necessary, 
sequencing reaction mixtures, including enzymes, were 
pipetted into the plates in advance and held at -20°C until 
needed. 

Dye Terminator Cycle Sequencing 

For dye terminator/AmpliTaq DNA polymerase sequencing, 
0.5 µg of template DNA and the PRISM Ready Reaction 
DyeDeoxy Terminator Cycle Sequencing Kit (Perkin-Elmer) were 
used. Cycle sequencing was performed in microtiter plates 
using 25 PCR cycles (30 sec at 95°C, 30 sec at 50°C, and 4 
min at 60°C). Prior to loading the amplified products on 
electrophoresis gels, unreacted dye terminators were removed 
using Sephadex columns scaled down to microtiter plates 
(Rosenthal and Charnock-Jones, 1993). 

Dye terminator/Thermo Sequenase sequencing was 
performed using the same experimental conditions except that 
the reaction mix contained 16.25 mM Tris-HCl (pH 9.5), 
4.0 mM MgCl2, 0.02% (v/v) NP-40, 0.02% (v/v) Tween 20, 42 µM 
2-mercaptoethanol, 100 µM dATP/dCTP/dTTP, 300 mM dITP, 0.017 
µM A/0.137 µM C/0.009 µM G/0.183 µM T from Taq Dye 
Terminators (Perkin-Elmer; no. A5F034), 0.67 µM primer, 
0.2-0.5 µg of template DNA, and 10 units of Thermo 
Sequenase (Amersham) in a 30 µl reaction volume. 
Unincorporated dye terminators were removed from reaction 
mixtures by precipitation with ethanol. 

Dye Primer Cycle Sequencing 

Dye primer/AmpliTaq DNA polymerase sequencing 
reactions were performed according to the instructions 
accompanying the Taq Dye Primer, 21M13 Kit (Perkin-Elmer). 
Cycle sequencing was carried out on 0.5 µg of template DNA 
with 19 PCR cycles (30 sec at 95°C, 30 sec at 50°C, and 90 
sec at 72°C) followed by six cycles, each consisting of 95°C 
for 30 sec and 72°C for 2.5 min. Prior to electrophoresis, 
the four base-specific reactions were pooled and 
precipitated with ethanol. 

Identical PCR conditions and the Thermo Sequenase 
Fluorescent Labelled Primer Cycle Sequencing Kit (Amersham) 
were used for dye primer/Thermo Sequenase sequencing 
reactions. 
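
The two cycling programs given above can be written down as data and compared. The following sketch is illustrative only: the class and constant names are hypothetical, and the calculation sums programmed hold times while ignoring the time the thermal cycler needs to ramp between temperatures, so real run times would be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float    # hold temperature in degrees Celsius
    seconds: float   # hold time in seconds


@dataclass(frozen=True)
class Stage:
    cycles: int      # number of repetitions of the steps
    steps: tuple     # tuple of Step objects


def hold_time_minutes(program) -> float:
    """Sum programmed hold times over all stages (ramping is ignored)."""
    total = sum(
        stage.cycles * sum(step.seconds for step in stage.steps)
        for stage in program
    )
    return total / 60.0


# Dye terminator program: 25 cycles of 95 C/30 s, 50 C/30 s, 60 C/4 min.
DYE_TERMINATOR = (
    Stage(25, (Step(95, 30), Step(50, 30), Step(60, 240))),
)

# Dye primer program: 19 three-step cycles followed by 6 two-step cycles.
DYE_PRIMER = (
    Stage(19, (Step(95, 30), Step(50, 30), Step(72, 90))),
    Stage(6, (Step(95, 30), Step(72, 150))),
)

print(hold_time_minutes(DYE_TERMINATOR))  # 125.0
print(hold_time_minutes(DYE_PRIMER))      # 65.5
```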

Sequence Acquisition and Analysis 

Gel electrophoresis and automatic data collection were 
performed with ABI 373A DNA sequencers (Perkin-Elmer). 
After removing cosmid vector and M13mp18 sequences from the 
shotgun sequence data, the data were assembled using the 
program XGAP (Dear and Staden, 1991) and edited against the 
fluorescent traces. To close remaining gaps, to make 
single-stranded regions double-stranded, and to clarify 
ambiguities, additional cycle sequencing reactions with 
selected shotgun templates were carried out using either 
custom-made primers (primer-walks) or universal primers. 

The complete double-stranded DNA sequence of cosmid 
pXB296 was analyzed using programs from the Wisconsin 
Sequence Analysis Package (version 8, Genetics Computer 
Group, Madison, WI, USA). Homology searches were performed 
with BLAST (version 1.4; Altschul et al., 1990) and FASTA 
(version 2.0; Pearson and Lipman, 1988). Several nucleotide 
and protein databases were screened (GenBank/Genpept, 
SwissProt, EMBL, and PIR). Identities and similarities 
between homologous amino acid sequences were calculated with 
the alignment program BESTFIT (Smith and Waterman, 1981). 
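
As an illustration of what "identities and similarities" mean for a pair of aligned amino acid sequences, the sketch below computes both percentages from an alignment that has already been produced. It does not reproduce BESTFIT: the alignment step itself is omitted, and the similarity groups used here are a hypothetical simplification, whereas BESTFIT derives similarity from its scoring matrix.

```python
# Hypothetical conservative-substitution groups, chosen for illustration only.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG")


def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or share a similarity group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str, gap: str = "-"):
    """Return (% identity, % similarity) over aligned columns without gaps."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        raise ValueError("no aligned residue pairs")
    identical = sum(a == b for a, b in columns)
    similar = sum(is_similar(a, b) for a, b in columns)
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy alignment (hypothetical): five aligned pairs, three of them identical.
print(identity_and_similarity("MKV-LA", "MRVQLG"))  # (60.0, 100.0)
```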

Example 2 

Comparison of Fluorescent Traces Created by Different Cycle 
Sequencing Methods 

When using a thermostable sequenase [Thermo Sequenase 
(Amersham)], the concentrations of dye terminators 
(Perkin-Elmer) can be reduced by 20- to 250-fold in comparison to 
the concentrations needed for Taq DNA polymerase without 
compromising the quality of the sequencing results (Table 
7). 

To compare the dye terminator and dye primer cycle 
sequencing procedures, representative templates derived from 
the pXB296 library were sequenced by both methods, each 
performed with Thermo Sequenase and Taq DNA polymerase 
(Figure 1). In general, dye terminator traces do not 
contain the many compressions (on average, one compression 
every 50 bases in single reads) that are common with dye 
primers if mixes do not contain nucleotide analogues like 
deoxyinosine or 7-deaza-deoxyguanosine triphosphates or if 
sequencers are used without active heating systems. In 
addition, dye terminator traces obtained with Thermo 
Sequenase show more uniform signal intensities than those 
obtained with Taq DNA polymerase, thus resulting in a 
reduced number of weak and missing peaks (e.g. a weak 
G-signal following an A-signal in Thermo Sequenase traces or 
a weak C-signal following a G-signal in Taq DNA polymerase 
traces). Using ABI 373A sequencers, errors in automatic 
base-calling of Thermo Sequenase/dye terminator scans only 
arise after 300 - 350 bases. The average number of resolved 
bases in dye primer gels (378 bases) is 46 bases longer than 
in those produced with dye terminators (332 bases). 
Furthermore, in Thermo Sequenase/dye primer sequences the 
peaks are very regular and the number of stops and missing 
bases decreases in comparison to Taq DNA polymerase/dye 
primer electropherograms. The number of compressions, 
however, is not significantly reduced. 

Example 3 

Shotgun Sequencing of Entire Cosmids Using Dye Terminators 
or Dye Primers 

To compare the efficiency of both methods, cosmid 
pXB296 of pNGR234a was shotgun sequenced using a combination 
of dye terminators and thermostable sequenase (Thermo 
Sequenase), whereas another cosmid, pXB110, was sequenced 
using a combination of dye primers and Taq DNA polymerase 
(Table 1). Over 93% (736 clones) of 786 dye terminator 
reads of pXB296 were accepted by XGAP with a maximal 
alignment mismatch of 4%. By increasing this level to 25%, 
so that most of the remaining data could be included in the 
assembly, 775 reads led to three 6 to 10 kbp stretches of 
contiguous sequence (contigs), two of which were joined 
after editing. To close the last gap and to complete 
single-stranded regions with data derived from the opposite 
strand, only 32 additional dye terminator reads using 
custom-made primers were required. It took <1 week to 
assemble and finalize the 34,010 bp DNA sequence of pXB296 
(EMBL accession no. Z68203; eight-fold redundancy; GC 
content, 58.5 mol%). 
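
The redundancy quoted for pXB296 can be checked approximately from the numbers in the text. The sketch below assumes that every read contributes the average of 332 resolved bases reported in Example 2 for dye terminator gels; that uniform read length is an assumption of this illustration, and the function names are hypothetical.

```python
def redundancy(n_reads: int, mean_read_length: float, sequence_length: int) -> float:
    """Average number of times each base is covered by a read."""
    return n_reads * mean_read_length / sequence_length


def percent_accepted(accepted: int, total: int) -> float:
    """Share of shotgun reads accepted by the assembler, in percent."""
    return 100.0 * accepted / total


# pXB296: 775 assembled shotgun reads plus 32 primer-walk reads.
pxb296_reads = 775 + 32
coverage = redundancy(pxb296_reads, 332, 34_010)

print(f"{coverage:.1f}-fold redundancy")             # about eight-fold
print(f"{percent_accepted(736, 786):.1f}% accepted")  # over 93%
```

Under this assumption the estimate is close to the eight-fold redundancy stated for pXB296.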

In contrast, only 308 (34%) of 899 shotgun reads 
obtained by Taq DNA polymerase/dye primer cycle sequencing 
of pXB110 were included in the first assembly (4% alignment 
mismatch). At the 25% alignment mismatch level, 879 reads 
were assembled, leading to 25 short contigs (1-2 kbp). 
These contigs had to be edited extensively in order to join 
most of them. "Primer walks", covering gaps and 
complementing single-stranded regions, were not sufficient 
to clarify all the remaining ambiguities in the assembled 
sequence. Every 100 - 150 bp, a compression in one strand 
could not be resolved by sequence data from the 
complementary strand. Therefore, it was necessary to 
resequence clones using dye terminators and universal 
primer. In total, 191 additional dye terminator reads had 
to be created. As a result, assembling and finalizing the 
34,573 bp sequence of pXB110 (10.5-fold redundancy; GC 
content, 58.3 mol%) took much more time than pXB296 did. 

Example 4 

Analysis of Cosmid pXB296 

Putative ORFs were located on the 34,010 bp sequence 
of pXB296 using the programs TESTCODE (Fickett, 1982) and 
CODONPREFERENCE (Gribskov et al., 1984), the latter in 
combination with a codon frequency table based on previously 
sequenced genes of Rhizobium sp. NGR234 (as well as the 
closely related R. fredii). All 28 ORFs and their deduced 
amino acid sequences exhibited significant homologies to 
known genes and/or proteins. The positions of the ORFs 
along pXB296, as well as the best homologues, are displayed 
in Table 2 and Figure 2. Ribosomal binding site-like 
sequences (Shine and Dalgarno, 1974) precede each putative 
ORF except for ORFS (position 11,214 - 12,455). If one 
disregards the homology to known glutamate dehydrogenases in 
the first 32 amino acids deduced from this ORF, a downstream 
alternative start codon (position 11,220) preceded by a 
Shine-Dalgarno sequence can be identified. Most of the ORFs 
are organised in five clusters (ORFs with only short 
intergenic spaces or overlaps between them). Cluster I, 
containing ORF1 to ORF5, encodes proteins homologous to 
trans-membrane and membrane-associated oligopeptide permease 
proteins and to a Bacillus anthracis encapsulation protein. 
Cluster II includes ORF6 and ORF7, which are homologous to 
aminotransferase and (semi)aldehyde dehydrogenase genes. 
Homologies to transposase genes [ORFS; cluster III (ORF10 
and ORF11)] and to various nif and fix genes [cluster IV 
(ORF12 to ORF20); ORF23, part of cluster V] are also 
reported. 
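
For orientation, the sketch below shows a naive six-frame scan for open reading frames. It is explicitly not TESTCODE or CODONPREFERENCE, which rely on statistical measures of coding potential and codon usage; it only reports start-to-stop stretches above a minimum length. The choice of ATG and GTG as start codons, the default length threshold, and all names are assumptions made for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}   # assumption: GTG accepted as an alternative start


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_strand(seq: str, min_codons: int):
    """Yield (start, end) 0-based half-open coordinates, stop codon included."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None                   # look for the next ORF


def find_orfs(seq: str, min_codons: int = 100):
    """Scan both strands; coordinates always refer to the given strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [(s, e, "+") for s, e in orfs_in_strand(seq, min_codons)]
    # Map hits found on the reverse complement back to forward coordinates.
    hits += [(n - e, n - s, "-") for s, e in orfs_in_strand(revcomp(seq), min_codons)]
    return sorted(hits)


# Toy sequence (hypothetical): one five-codon ORF including its stop codon.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "CC"
print(find_orfs(toy, min_codons=5))  # [(2, 17, '+')]
```

A scan of this kind overpredicts badly in GC-rich DNA, which is one reason coding-potential programs and homology searches were combined in the analysis described here.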

Presumed promoter and stem-loop sequences that might 
represent ρ-independent terminator-like structures (Platt, 
1986) are shown in Figure 2. Significant sigma-54-dependent 
promoter consensus sequences (5'-TGGCACG-N4-TTGC-3'; Morett 
and Buck, 1989), as well as nifA upstream activator 
sequences (5'-TGT-N10-ACA-3'; Morett and Buck, 1988), are 
found upstream of the nifB homologue ORF15, the fixA 
homologue ORF20, ORF21, ORF22, and ORF23. ORF23 is part of 
cluster V in pXB296, which includes the dctA gene of 
Rhizobium sp. NGR234 (van Slooten et al., 1992). 
Surprisingly, the published dctA sequence shows important 
discrepancies from the cosmid sequence. Therefore, a 
fragment encompassing this locus was amplified by PCR using 
NGR234 genomic DNA as template. By sequencing this fragment, 
the cosmid sequence of the present invention was confirmed. 
Example 5 

Analysis of the Complete Plasmid pNGR234a 

Using the thermostable sequenase/dye terminator cycle 
sequencing method herein described, 20 overlapping cosmids 
(including pXB296) of the symbiotic plasmid pNGR234a of 
Rhizobium sp. NGR234 were sequenced, together with two PCR 
products and a subcloned DNA fragment derived from cosmid 
pXB564 that cover two remaining gaps (position 276,448 - 
277,944 and position 480,607 - 483,991). The map of the 
sequenced cosmids is shown in Figure 4. The entire 
assembled 536 kb sequence of pNGR234a is given in Figure 3 
(deposited in EMBL/GenBank under accession no. U00090). 

The analysis of the complete nucleotide sequence 
revealed few regions of 98 - 100% identity to already 
published sequences in public databases. These sequences, 
which are listed in Table 8, had been derived 
either from Rhizobium sp. NGR234, derivatives of it or 
closely related strains of it. Therefore, the ORFs and 
their deduced proteins, 98 - 100% homologous to nifH, nodA, 
nodB, nodC, nodD1, nodS, nodU, nolX, nolW, nolB, nolU and 
"ORF1", represent already known genes/proteins (Table 8 and 
References). Some other ORFs and their deduced proteins, 
nearly identical to public database entries, were either 
only partially known before the disclosure of the present 
invention or exhibited significant differences, for 
instance, dctA, host-inducible gene A, nifD, nifK, nodD2, 
nolT, nolX, nolV, "ORF140", "ORF91", "RSRS9 25 kDa-protein 
gene" (Table 8 and References). 

As a first step, approximately 100 kb of pNGR234a was 
analyzed between position 417,796 and 517,279 using the 
programs TESTCODE (Fickett, 1982) and CODONPREFERENCE 
(Gribskov et al., 1984). In this initial ~100 kb of 
sequence, 76 ORFs were found and ascribed putative functions 
(= ORFs y4tQ to y4yO (excluding ORFs y4uD, y4uG, y4wG, y4wO, 
y4wP, y4xF, y4xQ, y4xG and y4yB and excluding ORF-fragments 
fu1, fu2, fu3, fu4, fv1 and fw1); see Table 3). It should 
be noted that since the sequence of cosmid pXB296 forms part 
of this 100 kb region, all of the ORFs identified in Table 
2 (except "ORF1") are reproduced (albeit with minor, but 
definitive, revisions) in Table 3. Most of the 76 ORFs and 
their deduced proteins showed homologies to public database 
entries that could help identify their putative functions. 
Only ORFs y4vK and y4xA (duplicated nifH) as well as y4yD, 
y4yE and y4yG (nolW, nolB and nolU) were identical to 
database entries (98 - 100% homology). In the case of 7 
ORFs and their deduced proteins, no homologous sequences in 
public databases have been found. 


As a second step, the remaining 436 kb of pNGR234a 
were analyzed using the methods noted above. The results of 
this analysis are discussed in Example 6. 
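
The sizes of the regions mentioned in this Example follow directly from their coordinates. The short sketch below assumes that all positions are 1-based and inclusive, which is consistent with the 49-nucleotide nod boxes listed in Table 5 (e.g. 4514 - 4562) but is not stated explicitly in the text; the function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a region given 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


first_region = span_length(417_796, 517_279)   # region analyzed first
gap_1 = span_length(276_448, 277_944)          # first gap closed separately
gap_2 = span_length(480_607, 483_991)          # second gap closed separately

print(first_region, gap_1, gap_2)  # 99484 1497 3385
```

The first region is therefore just under 100 kb, which leaves approximately 436 kb of the 536 kb plasmid for the second step.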


Example 6 

Genetic Organization of the Complete Plasmid pNGR234a 

In order to confirm and to improve the identification 
of probable coding regions in pNGR234a, the program GeneMark 
was used, which is based on matrices developed for organisms 
related to Rhizobium sp. NGR234 (R. leguminosarum and 
R. meliloti; Borodovsky et al., 1994). The use of this 
program currently represents the most frequently applied 
method to distinguish coding and non-coding regions in newly 
sequenced DNA of prokaryotes. Further analysis of the 
putative ORF products was carried out using methods to 
detect signal sequences, transmembrane segments and various 
other domains [PROSITE database search (Bairoch et al., 
1995); PSORT program (Nakai et al., 1991)]. 

In total, 416 ORFs were predicted to encode putative 
proteins (Freiberg et al., 1997). Additionally, 67 
fragments were detected that seemed to be remnants of 
functional ORFs. Some of these were disrupted by insertion 
of mobile elements. All identified functional ORFs and 
fragments of former functional ORFs are listed in Table 3. 

Within the initial ~100 kb region (position 417,796 
to 517,279) first analyzed in this study, 9 ORFs (y4uD, 
y4uG, y4wG, y4wO, y4wP, y4xF, y4xQ, y4xG and y4yB) and 6 
ORF-fragments (fu1, fu2, fu3, fu4, fv1 and fw1) were 
predicted in addition to the 76 ORFs (y4tQ to y4yO) listed 
within Table 3. 

According to Table 8, 12 ORFs of the 416 predicted 
coding regions were identical to public database entries 
(98% to 100% homology at the amino acid level), namely: y4hI 
(nodA), y4hH (nodB), y4hG (nodC), y4aL (nodD1), y4nC (nodS), 
y4nB (nodU), y4sM (ORF1), y4vK (nifH1), y4xA (nifH2), y4yD 
(nolW), y4yE (nolB), y4yG (nolU). In addition, the database 
entry of the homologue to y4yC (nolX) has been corrected and 
is now 98% identical to y4yC. Furthermore, the sequence of 
the ORF y4hB (noeE) has been available to the public since 
October 1996. Except for the 14 ORFs mentioned above, the 
remaining 402 ORFs are new; 139 of them show no homology to 
any known ORF/protein. The others exhibit less than 98% amino 
acid identity to public database entries over their whole 
length. 
INDUSTRIAL APPLICABILITY 

The present invention provides a detailed analysis of 
the symbiotic plasmid pNGR234a of Rhizobium sp. NGR234. The 
plasmid pNGR234a (including any ORFs encoded therein, or any 
part of the nucleotide sequence of the plasmid, or any 
proteins expressible from any of said ORFs or any part of 
said nucleotide sequence) has industrial applicability which 
can include its use in, inter alia, the following areas: 

(a) the analysis of the structure, organisation or 
dynamics of other genomes; 

(b) the screening, subcloning, or amplification by 
PCR of nucleotide sequences; 

(c) gene trapping; 

(d) the identification and classification of 
organisms and their genetic information; 

(e) the identification and characterisation of 
nucleotide sequences, amino acid sequences or 
proteins ; 

(f) the transportation of compounds to and from an 
organism which is host to at least one of said 
nucleotide sequences, ORFs or proteins; 

(g) the degradation and/or metabolism of organic, 
inorganic, natural or xenobiotic substances in 
a host organism; 

(h) the modification of the host-range, nitrogen 
fixation abilities, fitness or competitiveness 
of organisms; 

(i) obtaining a synthetic minimal set of ORFs 
required for functional Rhizobium-legume 
symbiosis; 

(j) the modification of the host-range of rhizobia; 

(k) the augmentation of the fitness or 
competitiveness of Rhizobium sp. NGR234 in the 
soil and its nodulation efficiency on host 
plants; 

(l) the introduction of desired phenotype(s) into 
host plants using said plasmid as a stable 
shuttle system for foreign DNA encoding said 
desired phenotype(s); or 

(m) the direct transfer of said plasmid into rhizobia 
or other microorganisms without using other 
vectors for mobilization. 